DETAILED ACTION
Response to Arguments 

1. Applicant's arguments filed 5/27/2004 have been fully considered but they
are not persuasive.

As per claims 1 and 8, the applicant argues that the prior art of record fails
to disclose that "the impact value of a particular speech item is based upon how
descriptive and/or important a word is" (Amendment page 3), "the step of
determining a desired inflection for each speech item in the sequence of speech
items based on the syllable count and the impact value for a particular speech
item" (Amendment page 4), and "the method for converting text to concatenated
voice as described by Applicants' independent claim 1" (Amendment page 4).
However, the prior art of record discloses all the limitations mentioned above:
"the impact value of a particular speech item is based upon how descriptive
and/or important a word is" (the COST FUNCTION sections at col. 12-15; how well
a speech unit candidate fits with its neighboring speech unit candidates
indicates how descriptive and/or important that candidate is), "the step of
determining a desired inflection for each speech item in the sequence of speech
items based on the syllable count and the impact value for a particular speech
item" (col. 23, ln. 35-67, counting syllables, and col. 13, ln. 39 to col. 14,
ln. 67, where inflection is determined based on how well the candidate units
concatenate together in terms of their pitch), and "the method for converting
text to concatenated voice as described by Applicants' independent claim 1"
(figure 1).

As per claim 2, the applicant argues that "Jacks fails to disclose or suggest
determining an impact value for each speech item and the step of determining a
desired inflection for each speech item in the sequence of speech items based on
the syllable count and the impact value for the particular speech item".
However, Coorman et al. already teach the step of "determining an impact value
for each speech item and the step of determining a desired inflection for each
speech item in the sequence of speech items based on the syllable count and the
impact value for the particular speech item", as mentioned with respect to
claim 1 above.

Therefore, claims 1-15 remain rejected over prior art of record. 

Allowable Subject Matter 

2. Claims 16-19 are allowed over prior art of record.

3. Regarding claim 16, Coorman et al. disclose a method for converting text
to concatenated voice by utilizing a digital voice library and a set of playback
rules, the method further comprising: determining a syllable count for each
speech item in the sequence of speech items (col. 23, ln. 35-45 and col. 23, ln.
35-67); determining an impact value for each speech item in the sequence of
speech items (the COST FUNCTION sections at col. 12-15, explained for claim 1
in the Response to Arguments section above); determining a pitch value within a
range for each speech item in the sequence of speech items by normalizing the
impact value for the particular speech item (col. 13, ln. 48-53); determining a
desired inflection for each speech item in the sequence of speech items based
on the syllable count and the pitch value for the particular speech item and
further based on the set of playback rules (col. 23, ln. 35-45 and col. 23, ln.
35-67 and the Cost Function at col. 12-15); determining a sequence of voice
recordings by determining a voice recording for each speech item based on the
desired inflection for the particular speech item and based on the available
voice recordings that correspond to the particular speech item (col. 9, ln.
33-37); and generating voice data based on the sequence of voice recordings by
concatenating adjacent recordings in the sequence of voice recordings (col. 9,
ln. 51-56). Coorman et al. fail to specifically disclose the method wherein the
playback rules dictate that the desired inflection for a glue item is based on the
desired inflection for surrounding payload items and that the desired inflection for
a payload item is based on the desired inflection for nearest payload items with
priority being given to speech items having a greater pitch value such that the
desired inflections are determined first for speech items having the greatest pitch
value and, thereafter, are determined for speech items in order of descending
pitch. Furthermore, it would not have been obvious to one of ordinary skill in the
art at the time of invention to modify Coorman et al. by incorporating the teaching
above. Therefore, claims 16-19 are allowed over prior art of record.
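
The ordering rule that distinguishes claim 16 can be illustrated with a minimal
Python sketch. It shows only the order in which desired inflections would be
determined, not how an inflection is computed. The SpeechItem class, the
resolution_order function, the sample sentence, and its pitch values are all
hypothetical. Resolving glue items after every payload item is an assumption
drawn from the rule that a glue item depends on its surrounding payload items.

```python
from dataclasses import dataclass


@dataclass
class SpeechItem:
    text: str
    pitch: int             # hypothetical pitch value; larger means higher priority
    is_glue: bool = False  # True for glue items, False for payload items


def resolution_order(items):
    """Return item indices in the order their inflections would be determined."""
    payload = [i for i, item in enumerate(items) if not item.is_glue]
    glue = [i for i, item in enumerate(items) if item.is_glue]
    # Payload items go first, greatest pitch value first. list.sort is stable,
    # so payload items with equal pitch keep their order in the sentence.
    payload.sort(key=lambda i: -items[i].pitch)
    # Assumption: glue items are resolved last, once their surrounding
    # payload items already have a desired inflection.
    return payload + glue


sentence = [
    SpeechItem("the", 1, is_glue=True),
    SpeechItem("weather", 3),
    SpeechItem("is", 1, is_glue=True),
    SpeechItem("sunny", 5),
    SpeechItem("today", 2),
]
order = resolution_order(sentence)
print([sentence[i].text for i in order])
```

Under these assumptions the payload items are visited as "sunny", "weather",
"today", followed by the glue items "the" and "is".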

Claim Rejections - 35 USC § 112

4. The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and 
process of making and using it, in such full, clear, concise, and exact terms as to enable any 
person skilled in the art to which it pertains, or with which it is most nearly connected, to make
and use the same and shall set forth the best mode contemplated by the inventor of carrying
out his invention.

5. Claims 9-15 are rejected under 35 U.S.C. 112, first paragraph, as failing to
comply with the enablement requirement. The claims contain subject matter
which was not described in the specification in such a way as to enable one
skilled in the art to which it pertains, or with which it is most nearly connected, to
make and/or use the invention. The specification discloses a pitch value
between 1 and 5, but fails to indicate the unit of measurement associated with
these values, so one skilled in the art cannot determine what they represent.

Claim Rejections - 35 USC § 102 

6. The following is a quotation of the appropriate paragraphs of 35 
U.S.C. 102 that form the basis for the rejections under this section made in this 
Office action: 

A person shall be entitled to a patent unless - (e) the invention was described in (1) an 
application for patent, published under section 122(b), by another filed in the United States 
before the invention by the applicant for patent or (2) a patent granted on an application for 
patent by another filed in the United States before the invention by the applicant for patent, 
except that an international application filed under the treaty defined in section 351(a) shall 
have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under 
Article 21(2) of such treaty in the English language. 

7. Claims 1 and 8 are rejected under 35 U.S.C. 102(e) as being anticipated 
by Coorman et al. (US Patent No. 6665641). 

8. Regarding claim 1, Coorman et al. disclose a method for converting text to
concatenated voice by utilizing a digital voice library and a set of playback rules
(col. 8, ln. 59 to col. 9, ln. 56), the digital voice library including a plurality of
speech items and a corresponding plurality of voice recordings wherein each
speech item corresponds to at least one available voice recording wherein
multiple voice recordings that correspond to a single speech item represent
various inflections of that single speech item (col. 9, ln. 1-8), the method
including receiving text data, converting the text data into a sequence of speech
items in accordance with the digital voice library (col. 9, ln. 13-25), the method
further comprising:

determining a syllable count for each speech item in the sequence of
speech items (col. 23, ln. 35-45);

determining an impact value for each speech item in the sequence of
speech items (col. 9, ln. 33-44, or referring to the COST FUNCTION sections at
col. 12-15; the impact value is interpreted as how well the speech item fits in the
concatenated speech);

determining a desired inflection for each speech item in the sequence of
speech items based on the syllable count and the impact value for the particular
speech item and further based on the set of playback rules (col. 9, ln. 26-37);

determining a sequence of voice recordings by determining a voice
recording for each speech item based on the desired inflection for the particular
speech item and based on the available voice recordings that correspond to the
particular speech item (col. 9, ln. 33-37); and

generating voice data based on the sequence of voice recordings by
concatenating adjacent recordings in the sequence of voice recordings (col. 9, ln.
51-56).

9. Regarding claim 8, Coorman et al. further disclose that a plurality of
speech items includes a plurality of words, the method further comprising:

determining a pitch value for each speech item in the sequence of speech
items by normalizing the impact value for the particular speech item (col. 10, ln.
49-55 or col. 13, ln. 48-53), wherein the desired inflection for each speech item is
further based on the pitch value for the particular speech item (col. 10, ln. 43-53).
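
The normalization step recited for claim 8 can be pictured with a minimal
Python sketch. The function name pitch_values, the linear min-max scaling, the
rounding, and the sample impact values are all assumptions made for
illustration; neither the claim language nor the cited passages of Coorman et
al. are being quoted here. The default range of 1 to 5 follows the pitch value
range noted in the 35 U.S.C. 112 rejection in this action.

```python
def pitch_values(impacts, low=1, high=5):
    """Map raw impact values onto integer pitch values in [low, high]."""
    smallest, largest = min(impacts), max(impacts)
    if largest == smallest:
        # Assumption: with no spread in impact, use the middle of the range.
        return [(low + high) // 2] * len(impacts)
    span = largest - smallest
    # Linear min-max scaling, rounded to the nearest integer pitch value.
    return [low + round((v - smallest) * (high - low) / span) for v in impacts]


# Hypothetical impact values for four speech items.
print(pitch_values([10, 90, 30, 50]))
```

With these sample inputs the item with the lowest impact receives pitch value 1
and the item with the highest impact receives pitch value 5.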

Claim Rejections - 35 USC § 103 

10. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for
all obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described 
as set forth in section 102 of this title, if the differences between the subject matter sought to 
be patented and the prior art are such that the subject matter as a whole would have been 
obvious at the time the invention was made to a person having ordinary skill in the art to which 
said subject matter pertains. Patentability shall not be negatived by the manner in which the 
invention was made. 

11. Claim 2 is rejected under 35 U.S.C. 103(a) as being unpatentable over
Coorman et al. (US Patent No. 6665641) in view of Jacks et al. (US Patent No.
4692941).

12. Regarding claim 2, Coorman et al. fail to specifically disclose that the
speech items are glue items and a plurality of the speech items are payload
items, the method further comprising:

setting a flag for any speech item in the sequence of speech items that is
a glue item, wherein the playback rules dictate that the desired inflection for a
glue item is based on the desired inflection for surrounding payload items in the
sequence of speech items and that the desired inflection for a payload item is
based on the desired inflection for nearest payload items in the sequence of
speech items.

However, Jacks et al. teach a method of setting a flag for any speech item in
the sequence of speech items that is a glue item (col. 4, ln. 48-50; the main point
is to identify glue words), wherein the playback rules dictate that the desired
inflection for a glue item is based on the desired inflection for surrounding
payload items in the sequence of speech items and that the desired inflection for
a payload item is based on the desired inflection for nearest payload items in the
sequence of speech items (col. 9, ln. 51 to col. 10, ln. 27). The advantage of using
the teaching of Jacks et al. in Coorman et al. is to analyze the structure of the
sentence and assign appropriate prosody to each word to make the synthesized
speech sound more natural.

Since Coorman et al. and Jacks et al. are analogous art from the same field
of endeavor, it would have been obvious
to one of ordinary skill in the art at the time the invention was made to modify
Coorman et al. by incorporating the teaching of Jacks et al. in order to analyze
the structure of the sentence and assign appropriate prosody to each word to
make the synthesized speech sound more natural.



13. Claims 3-5 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Coorman et al. (US Patent No. 6665641) in view of Jacks et
al. (US Patent No. 4692941) and further in view of Minowa et al. (US Patent No.
6438522).

14. Regarding claim 3, the modified Coorman et al. fail to specifically
disclose that a plurality of speech items includes a plurality of phrases. However,
Minowa et al. teach that a plurality of speech items includes a plurality of phrases
(col. 7, ln. 6-10). The advantage of using the teaching of Minowa et al. in the
modified Coorman et al. is to allow the system to process phrase input speech
items.

Since the modified Coorman et al. and Minowa et al. are analogous art
from the same field of endeavor, it would have been obvious
to one of ordinary skill in the art at the time the invention was made to further
modify Coorman et al. by incorporating the teaching of Minowa et al. in order to
allow the system to process phrase input speech items.

15. Regarding claim 4, the modified Coorman et al. fail to specifically
disclose that a plurality of speech items includes a plurality of words. However,
Minowa et al. teach that a plurality of speech items includes a plurality of words
(col. 7, ln. 6-10). The advantage of using the teaching of Minowa et al. in the
modified Coorman et al. is to allow the system to process single word input
speech items.

Since the modified Coorman et al. and Minowa et al. are analogous art
from the same field of endeavor, it would have been obvious
to one of ordinary skill in the art at the time the invention was made to further
modify Coorman et al. by incorporating the teaching of Minowa et al. in order to
allow the system to process single word input speech items.

16. Regarding claim 5, the modified Coorman et al. fail to specifically
disclose that a plurality of speech items includes a plurality of syllables.
However, Minowa et al. teach that a plurality of speech items includes a plurality
of syllables (col. 7, ln. 10-25). The advantage of using the teaching of Minowa et
al. in the modified Coorman et al. is to increase processing speed by using a
syllable-based segmentation scheme to reduce the number of speech models.

Since the modified Coorman et al. and Minowa et al. are analogous art
from the same field of endeavor, it would have been obvious
to one of ordinary skill in the art at the time the invention was made to further
modify Coorman et al. by incorporating the teaching of Minowa et al. in order to
increase processing speed by using a syllable-based segmentation scheme to
reduce the number of speech models.

17. Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over 
Coorman et al. (US Patent No. 6665641) in view of Gasper et al. (US Patent No. 
5278943). 



18. Regarding claim 6, Coorman et al. fail to specifically disclose that multiple
voice recordings that correspond to a single speech item represent various
inflections of that single speech item and wherein the various inflections belong
to various inflection groups including at least one standard inflection group, at
least one emphatic inflection group, and at least one question inflection group.
However, Gasper et al. suggest storing recordings having different prosodic
environments (col. 13, ln. 18-29). Thus, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to modify Coorman et
al. by specifically making recordings of these different inflections, providing the
digital library with a wide range of speech variations of particular words to enhance
speech synthesis capabilities and increase the system's reliability.



19. Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over 
Coorman et al. (US Patent No. 6665641) in view of Gasper et al. (US Patent No. 
5278943) and further in view of Jacks et al. (US Patent No. 4692941). 

20. Regarding claim 7, the modified Coorman et al. fail to specifically disclose
that at least one question inflection group includes a single word question
inflection group and a multiple word question inflection group. However, Jacks et
al. teach that at least one question inflection group includes a single word
question inflection group and a multiple word question inflection group (col. 9, ln.
45-50). The advantage of using the teaching of Jacks et al. in Coorman et al. is
to assign appropriate pitch to word(s) in a question to make the speech sound
more natural.

Since the modified Coorman et al. and Jacks et al. are analogous art
from the same field of endeavor, it would have been obvious
to one of ordinary skill in the art at the time the invention was made to further
modify Coorman et al. by incorporating the teaching of Jacks et al. in order to
assign appropriate pitch to word(s) in a question to make the speech sound more
natural.

Conclusion

Any inquiry concerning this communication or earlier communications from
the examiner should be directed to Huyen Vo, whose telephone number is
703-305-8665 and email address is huyen.vo@uspto.gov. The examiner can
normally be reached on M-F, 9-5:30.

If attempts to reach the examiner by telephone are unsuccessful, the 
examiner's supervisor, Doris To can be reached on 703-305-4827. The fax 
phone number for the organization where this application or proceeding is 
assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from 
the Patent Application Information Retrieval (PAIR) system. Status information 
for published applications may be obtained from either Private PAIR or Public 
PAIR. Status information for unpublished applications is available through 
Private PAIR only. For more information about the PAIR system, see
http://pair-direct.uspto.gov. Should you have questions on access to the Private
PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197
(toll-free).




